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Abstract: Compressed sensing theory is slowly making its way to solve more and more astro¬ 
nomical inverse problems. We address here the application of sparse representations, convex opti¬ 
mization and proximal theory to radio interferometric imaging. First, we expose the theory behind 
interferometric imaging, sparse representations and convex optimization, and second, we illustrate 
their application with numerical tests with SASIR, an implementation of the FISTA, a Forward- 
Backward splitting algorithm hosted in a LOFAR imager. Various tests have been conducted in 
Garsden et al., 2015. The main results are: i) an improved angular resolution (super resolution of 
a factor ~ 2) with point sources as compared to CLEAN on the same data, ii) correct photometry 
measurements on a field of point sources at high dynamic range and iii) the imaging of extended 
sources with improved fidelity. SASIR provides better reconstructions (five time less residuals) of 
the extended emission as compared to CLEAN. With the advent of large radiotelescopes, there is 
scope for improving classical imaging methods with convex optimization methods combined with 
sparse representations. 
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1. Introduction 


The emergence of the new sampling theorem known as Compressed Sensing [[]]. 0 enabled to 
build, along with convex optimization and sparse representations, a solid framework to solve a 
variety of inverse problems (such as imaging) which are daily found in medical imaging, biology 
and recently in astronomy (e.g. [0]). In this framework, new methods appeared for robust and fast 
image reconstruction which are crucial when using giant radio interferometers such as LOFAR and 
SKA. 

We present here a summary of the basics in radio interferometry in Section 0, from the mea¬ 
surement with an antenna (§ |2.1|) to the method used to produce radio maps (§ |2.2|) . In Section 0, we 
introduce the notion of sparse representation (§|3. ID and one current method to solve convex opti¬ 
mization problems (§|3.2|). In Section 0. we formulate the imaging problem as an inverse problem 
in the CS framework ( §0), and we illustrate the principle of reconstruction with some numerical 
tests made in the scope of radio imaging with modern interferometers (§ |4.2| ). 

2. Principles of radio interferometry 

2.1 From a single dish to antenna arrays 

2.1.1 Limitation of single dish observations 

A radio antenna is an electromagnetic transducer which converts electromagnetic waves (and often 
keeping the information about polarization) into current/voltages signal which then can be ac¬ 
quired, processed and sampled. It can be simply represented by a thermometer which thermalizes 
with a brightness temperature over a certain region of the sky, the antenna Field Of View (FoV). 
Depending on its geometry and on the observing wavelength, each antenna has a directional gain 
pattern (or beam) which describes how the antenna response varies across the FoV, away from the 
pointing center. This pattern also varies with time as well as with the electrical conditions of the 
antenna surroundings. Classical (directive) radio antennas have a paraboloid reflector to condense 
the radio signal collected over an effective area 1 (A e ff) in an unique focal point. The beam pattern 
can be decomposed in the main lobe, which defines the angular resolution of the observation, and 
side lobes, in which radio frequency interferences (RFI) can alter the astrophysical signal. Because 
there is a link between the measured Point Spread Function (PSF) of the telescope and the aperture 
diameter D ( u s h (through the Wiener-Khintchine theorem), a large collecting area brings a higher 
angular resolution 8 6 on the sky following the well-known relation 8 6 °< ; . 

These directive antennas are mechanically steerable and their mechanical operability and resis¬ 
tance are the main factors which limit their maximum size. The Green Bank Telescope (GBT) and 
the Radio Telescope Effelsberg (RTE) are the largest steerable dishes (of ~ 100m in diameter) that 
can be built with metallic structure. To achieve higher angular resolution and a better sensitivity, 
the size of the reflector must be increased, and a straightforward way to build such instrument is to 
rely on natural formations (e.g. Arecibo Observatory) or to divide them in manageable pieces (e.g. 
Nanqay Radio Telescope). In the case of the Arecibo radiotelescope, different pointing directions 
are achievable by moving the focus point itself over the surface of the reflector(s). 

1 which possibly differs from the physical collecting area 
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2.1.2 Antenna arrays 

The construction of huge instruments as depicted above is itself limited by construction constraints 
and maintenance costs. Consequently, combining antennas of manageable size was proposed very 
early as a way to increase both angular resolution and sensitivity. Using radio antennas in arrays 
is not new, and was used since the early age of radio astronomy at very low frequencies by Karl 
Jansky 2 and Grote Reber who pioneered the observation of the first astrophysical radio sources. 

The principle of antenna arrays is to extrapolate the Wiener-Khintchine theorem to sparsely 
filled apertures. It is a way to pave a larger artificial aperture to inherit the properties of a larger 
radio telescope that can not be built in a single piece. By putting telescopes at different mutual 
distances and at a maximum distance D max , one can mimic the effect of observing with a single 
dish telescope of aperture D max . A maximum angular resolution Sd max °< jy- can be achieved but 
the sensitivity scales only with the sum of the effective areas of the different telescopes rather than 
with the full area covered by the synthetic aperture. 

There are two main ways to use radio antennas in an array: 

• in a phased array: all antenna signals are summed together after compensating for the rel¬ 
ative geometric delays created between the antennas by observing in a particular direction. 
The incoming EM signal reaches the antenna at various times and will be temporally “di¬ 
luted” if no phasing is performed. The summed signal results in an electronic beam pointed 
in the phasing direction. For an array of directive antennas (e.g. dishes), the phasing direc¬ 
tion must match the pointing direction commanded to each individual antenna. For an array 
of undirective, wide FoV, antennas (e.g. dipole over the ground), an electronic, synthetic and 
directive beam is formed and pointed toward the direction of interest (i.e. the direction of 
phasing). The phasing stage can be done chromatically by inserting phase differences A(j)-' x 
to the signal of antenna i, (at time t and frequency v) or achromatically using true time delays 
At- (e.g. coax cable lengths, delay lines). This resulting “phased” sum of the antenna signals 
improves the sensitivity by a factor of 1 IN ant and the electronic beam becomes the PSF of 
the array. 

• in an interferometer: after performing the same delay compensation on each antenna signal, 
the signals are correlated against each other, averaged in time and frequency and stored for 
each baseline (i,j) formed by antenna i and j. Each baseline produces a fringe pattern mul¬ 
tiplied by the antenna beam pattern (asuming that antennas i and j have identical electrical 
properties, which is not true in reality) to form the two-antenna interferometer PSF. After 
combination with all other baselines, the PSF of the whole instrument can be derived and 
corresponds to the PSF of the equivalent phased array. 

There is a continuity between single dish radio telescopes and the filling of an artificial aperture 
with small aperture telescopes. Indeed, one can inteipret a single paraboloid dish as a continuous 
phased array or interferometer of antennas [ 0 ] which are distributed along a parabola. This shape 
possesses only one focal point and acts as a natural phasing system for signals coming from the 
direction 6 and reflecting on the different parts of the reflector. Any carved hole in the parabolic 

2 from which the flux density unit used in radio astronomy is defined: 1 Jansky = 1 Jy = 10~ 26 W.m~ 2 .Hz~* 
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aperture will affect the distribution of sidelobes in the PSF but will not affect the maximum achiev¬ 
able angular resolution (in first approximation). By continuity, one can see an array (equipped 
with a proper phasing system) as a huge aperture mainly tilled with holes but providing a similar 
angular resolution. This concept is called aperture synthesis, and radio interferometry relies on 
this principle to make imaging possible. The distribution of the antennas in the array is therefore 
important as it has a direct effect on the shape of the instrumental PSF. In practice, various effects 
alter the effective PSF of the array (effect of other antenna, spill-over, efficiency, mutual coupling, 
mechanical stability, presence of the ground, reflection, diffraction effects, ...). Examples of inter¬ 
ferometers are the Very Large Array (VLA, NRAO), the Westerbork Synthesis Radio Telescope 
(WSRT), the Low Lrequency Array (LOLAR) [@|, the Long Wavelength Array (LWA) [0] and in 
the future, the Square Kilometre Array (SKA) [®. Example of phased arrays are the Nan£ay De¬ 
cametre Array (NDA) 3 , any antenna field of a LOLAR station, the Murchison Widefield Array tiles 
(MWA) [0] and the incoming NenuLAR array [HU ITTH . LOLAR is an instrument which couples 
the two concepts of arrays 4 : at the station level, antenna signals are combined in phased arrays, 
at the interferometer level, signal from stations can either be summed as a giant phased array, or 
correlated as a giant interferometer. 

2.2 Aperture synthesis with interferometry 
2.2.1 What is measured by an interferometer? 

An interferometer is defined as a collection of antenna baselines formed with theoretically identical 
antennas. Each baseline bjj will produce sets of correlation measurements at different times t 
and frequencies v. Correlation between two antennas signal can be seen as the Young’s holes 
interference experiment in optics. The Wiener-Khintchine theorem links the beam pattern to the 
Lourier transform (LT) of the aperture autocorrelation and is identical to the fringe pattern created 
by two coherent and in-phase point sources. In the scope of a planar array of two antennas, the 
beam pattern is a linear fringe pattern projected on the sky with geometry depending on the time, 
frequency, and the perpendicular projection of the baseline w.r.t. the direction of interest. An ideal 
interferometer samples the sky in the Lourier domain [Oh As a classical antenna or antenna array 
samples the sky brightness temperature, an interferometer will sample the LT of the sky brightness 
through the measurement of spatial frequencies. One baseline (projected against the normal of the 
pointing direction) gives access to one spatial frequency of the sky brightness B. Short baselines 
sample small spatial frequencies associated with large scale structures in the sky. Conversely, large 
baselines provide information on the small scale structures in the sky. The maximum baseline 
limits the maximum angular resolution of the observation at X with 8 0 max <=< jX-. 

In a simplified framework, for a coplanar array in a small field approximation, the measured 
quantity between antennas (i,j) is given by Eq. [2.1[ . 

Vi j = [ B(r)exp(—2i7Cbij.r/X)d£l (2.1) 

—- JFoV 

Vij is the complex (fringe) visibility, dQ. the solid angle element, B(r) is the sky brightness 
centered around the phase center direction r (defining a projected 2D plane with variables /, m and n 

3 http://www.obs-nancay.fr/-Le-reseau-decametrique-.html 

4 For more information on LOFAR, see www.lofar.org 
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the direction cosines taking their origin at the pointing center), exp(—2 nbij.r/X) can be expressed 
analytically using the coordinates (u,v,w) and (l,m,n) which are Fourier pairs through the simple 
FT in Eq. [TT]. 

By neglecting the third component w in Eq. o in the small field approximation, each spatial 
frequency can be represented as a unique point in the “u-v” plane. This plane is then the FT of the 
sky. The more samples we have in the u-v plane, the more complete the knowledge of the sky FT. 
The limited number of antennas limits the number of measured spatial frequencies at a given time 
and frequency. More visibility data can be obtained by observing in various frequency bands and 
at various times (e.g. Earth Rotation Synthesis). Even with a perfect data calibration, the limited 
amount of measurements distorts the image of the sky, making the imaging problem an inverse 
problem. Eq. can be also seen as the integral of the product of the sky brightness B (over the 
antenna FoV) by a fringe pattern projected on the sky, as if the sky was seen through a Venetian 
blind. An interferometer is a set of spatial filters which measures the coefficient of the sky in a 2D 
Fourier basis of functions. 

In reality, the framework is much more complex and the data modeling as well as the real 
data calibration is performed using the Radio Interferometer Measurement Equation (RIME) [PL 
PL IT5tl which will not be detailed here (see [[Pi and references therein). This framework is up to 
now the most accurate and most general linear framework to model how a polarized radio signal 
is transformed from its source to its measure in the form of visibilities. All effects are modeled by 
Jones matrices which serve to express direction-independent effects (electronic gains, array phase, 
clock shift and drifts, etc.) and direction-dependent effects (antenna beam pattern, ionosphere, 
polarization, etc.). Direction-(in)dependent calibration of radio data is an active field of research in 
the scope of modern radio interferometry. 

Given a discrete and discontinuous set of visibilities, which are the true measurements given 
by an interferometer, the task is to inverse the problem to recover the full original FT of B and 
obtain B through inversion. 

2.2.2 From visibilities to images 

As seen above, the imaging problem can be formulated as the inverse problem of Eq. [2.2[ . 

V=MFB+N (2.2) 

With V the measured complex visibilities, B the brightness distribution of the sky, F the Fourier 
operator to convert the sky brightness to its FT, M a masking operator which accounts for the 
unsampled regions of the FT in the data and N a generic additive noise which comprises all sources 
of noise. We assume here a perfectly calibrated set of visibilities V which constitutes our data. 
Knowing M and F, the imaging problem is to deduce B from data V. M is nothing more than the 
FT of the PSF and is often combined to a weighting function in the u-v plane. The latter enables 
some apodization of different visibilities depending on their noise and density, with the effect of 
modifying the instrumental PSF. In order to produce an image by an inverse 2D FT, the visibilities 
are first gridded using convolution functions on a regular grid. A raw inverse FT of the gridded 
visibilities will result in a highly distorted image, called the dirty image, which is the convolution 
of the sky brightness with the instrumental PSF, the dirty beam. It is very unlikely that any relevant 
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scientific information could be retrieved from the dirty image. It is necessary to perform a set of 
additional operations to produce a scientifically usable image. Consequently, the problem reduces 
to a deconvolution problem with the goal to remove the effect of the instrumental response from 
the data. 

2.2.3 Deconvolution with CLEAN 

The CLEAN algorithm 1IT7I1 is a simple and efficient deconvolution algorithm which was used 
for «40 years by the radio community. It proposed an iterative approach which performs source 
detection and fractional PSF subtraction on the dirty image. At each iteration, it searches for the 
maximum peak location and subtracts a fraction (generally sslO%) of the PSF scaled to the detected 
peak maximum. It jointly fills a map with the detected peaks, known as CLEAN components, 
whichs leads at the end to the model map containing detected sources in the dirty image. The 
algorithm stops when a threshold is reached or when the maximum number of iterations has been 
performed. The dirty beam main maxima is then fitted with a 2D elliptical gaussian which serves as 
a regularization function, the CFEAN beam. The final CLEAN component map, the model image, 
is convolved by this CLEAN beam and added to the residual map to form the restored image. As 
the model image is usually not directly usable for science, it is convolved with the CLEAN beam 
which accounts for the actual angular resolution accessible with the observation. In addition, the 
residual map is combined to the results to restore potentially missed background emission (e.g. 
diffuse extended emission). 

Many improvements of the CLEAN algorithm were developed HTSU to enhance the original 
version in terms of computational effort, accuracy, and fidelity w.r.t. the source characteristics (e.g. 
Multiscale CLEAN HIPU L They generally improve the quality of the restored images by building 
better models of the sky and images with lower residuals. One of the latest version of CLEAN- 
based algorithms is the Multi-Scale Multi-Frequency Synthesis (MS-MFS) lEZDH algorithm which 
accounts for the extended features of the source as well as its dependence in frequency. Other 
deconvolution algorithms were also developed, and are based on statistical approaches I [HI . [221 l. 
We will focus in the following, on the methods based on sparsity. 

2.2.4 Limitations of classical imaging 

With modern interferometers such as LOFAR and SKA, simplifying hypotheses do not hold any 
longer such as the small field and the coplanar array approximations [Ql . At low frequencies 
(<300 MHz), the antenna beam pattern widens and becomes sensitive to a larger number of radio 
sources which will impact the visibilities. Correct calibration requires an advanced knowledge of 
the sky and often necessitates wide-field imaging [[24]]. On the one hand, the variation of the beam 
pattern with the direction (and with polarization) must be taken into account and, on the other 
hand, especially for continent-sized arrays, the curvature of Earth is no longer negligible and the 
projected baseline will have a third component, w. These two constraints destroy the simple FT 
relationship of Eq. |2.1| and Eq. These can be addressed using respectively the A-projection 
IE27II and W-projection lE26ll (or more recently W-Stacking 112711). 

Time and frequency integration can improve the quality and Signal-To-Noise (SNR) ratio of 
the image to a certain extent but an upper limit exists when smearing effects kill the correlation 
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starting from the largest baselines. Long integrated images then require averaging images pro¬ 
duced in smaller time windows (and smaller bandwidths) which provide enough SNR to perform 
deconvolution. Conversely, fast imaging will be limited by the amount of available data and there¬ 
fore, by the noise level of the measured u-v components. However, a fast imaging mode is required 
when observing transient radio sources, therefore, the capability of detecting those transients will 
be limited by the noise level in the snapshot images. The accuracy, fidelity and SNR of the im¬ 
ages and the required timescale to produce them put a limit on the minimum transient detection 
timescale. 

There is scope for improving the image reconstruction algorithms by using recent results pro¬ 
duced in the image/signal processing field. The application of the methods presented in Section 0 
represents an important milestone in the field of radio interferometric imaging by aperture synthe¬ 
sis. 


3. Principles of sparse image reconstruction 

3.1 Sparse representation 

3.1.1 Definition 

A discrete signal x can be represented by Eq. .1 [ . 

N 

x=Y j ai$i (3-1) 

i '=0 

where the a,- are the coefficients of x in some space of representation <t>, or dictionary, composed of 
the atoms (j)j. The signal x is often expensive to store and there is interest of grasping the essential 
information about x at a minimum cost. One way to do this is to rely on sparse representations 
which enable data compression. A signal is strictly sparse when the family {a,} is almost null, 
e.g. most {a,} are null and x can be represented with a small number of coefficients. We define 
the support of x as the set of non-null coefficients of x, and we can define the zero pseudo-norm Iq 
as the cardinal of this support such as ||jt||r (l = card(supp(x)). Most natural signals are not strictly 
sparse but are “compressible” (or weakly sparse), i.e. the essential information is gathered in some 
coefficients and the variation of the coefficient amplitude ranked by decreasing order converges 
quickly to 0 according to a power law I [Dill. By defining a threshold on the coefficient amplitude, we 
can select only the most representative coefficients constituting x. Therefore x can be approximated 
efficiently by its sparse representation x. 

3.1.2 Dictionaries 

Depending on the nature of the signal, it is possible to find (and even build) a dictionary in which 
the signal x will have a relatively sparse representation. A dictionary is an indexed collection of 
atoms {</>,}, where the index can represent, a position (Dirac), a frequency (Fourier) or a scale (e.g. 
Wavelets). A dictionary can be represented as a matrix where the columns are the atoms. The 
choice of the correct dictionary is critical and for a given signal, the “best” dictionary is the one 
which provides the sparsest description. For example, a time series signal made of the sum of 
three cosine waves of different frequencies will not have a sparse representation in the time domain 
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as a lot of coefficients are required to represent the signal with enough fidelity at each time step. 
However, in the Fourier domain, only three coefficients, corresponding to the three frequencies 
(knowing the parity of cosine function) are necessary to describe the signal entirely. Furthermore, 
a 2D image of the sky which only contains 1-pixel point sources will be sparse in the image domain 
(using Dirac functions to locate the non-null coefficient) but will not have a sparse representation 
in the 2D Fourier domain (where the information of the position is stored in the phase). 

For an updated review on sparse representations and dictionaries, refer to [ESI l29ll . 

3.2 Solving inverse problems with convex optimization and proximal methods 
3.2.1 Inverse problems as convex optimization problems 

Sparsity and sparse representations could develop throughout the fields of signal/image processing 
thanks to the emergence of a new sampling theorem, compressed sensing [Ql Q, EJ which offers 
an alternate to the Shannon sampling theorem. This new framework is extremely useful for formu¬ 
lating and solving ill-posed inverse problems. A linear inverse problem can simply be formulated 
with Eq. EH- 

y=(p{x) + n (3.2) 

y are the data, x is the unknown signal to recover, n is the noise and <p, the characteristic transform 
of the problem linking the wanted signal x to the available data y. <p operates as a degradation 
operator on x and appears in various kind of problems as defined in fl2%ll : 

• Deconvolution problem: where <p operates a convolution on x filtering away the high spatial 
frequencies. 

• Super resolution: <p is also a convolution plus a subsampling operation. 

• Inpainting: (p is a binary operator which masks a substantial amount of information of x. This 
is important for the interferometric imaging problem where only discrete spatial frequencies 
are sampled. 

• Compressed Sensing decoding: <p is an operator of random linear measurements of x assum¬ 
ing that x has a sparse representation in some dictionary <t>. 

When these inversion problems are ill-posed (i.e. we are not sure of the existence and of the 
unicity of the solution, nor if the solution is stable under change of initial conditions) and when a 
good solution for x is searched for, some prior knowledge on the structure of x is required to reduce 
the space of acceptable solutions. Classical regularization techniques can be applied (such as the 
Tikhonov regularization [EH] ) but sparse regularization techniques offer a wider range of methods. 
In the compressed sensing framework, three main conditions have to be met: i) the problem must 
be ill-posed, ii) a sparsifying dictionary must exist for x, iii) columns of <p and must be incoherent 
(the incoherence property is defined in [QTJ] ). 

The classical method for solving this problem is to find a solution x that makes <p(x) close to 
the data y. For this, we define a misfit function as |y — <p(x)||^ (where £2 is the Euclidian norm) 
for which we want to find a minimum. We can formulate the minimization problem as Eq. pO| 

min||y—<p(x)||| 2 . 
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(3.3) 


We introduce a new quantity &{x) that will put a penalty on unwanted solutions. The mini¬ 
mization problem then becomes Eq. |3.4| . 

min||y— 9 (x)||| +A^(x). (3.4) 

X 

Let’s assume that the signal x is k-sparse in a dictionary d> with coefficients {a,} = a. x can 
be written as in Eq. [3.2| , x = <Pcx and its 1$ pseudo-norm is ||a||^ 0 = k. In the context of inverse 
problems, we can transform Eq. |33] into Eq. [331 . 

min11a1s. t. ||y- (p(®a)\\e 2 < e (3.5) 

However, Eq. [33] is non convex and falls to a NP-Hard combination problem, because of the 
£ 0 pseudo-norm. We therefore introduce a convex relaxation by replacing the f'o pseudo-norm with 
the £\ norm (where 11or11/■, = L ; |«|). Eq. [33] reformulates in Eq. [33| which, under it Lagrangian 
form, expressed as Eq. 


mm||a||^ s. t. ||y- <p(®a)\\e 2 < £ (3.6) 

imnA||a||^ + ^||y —<p(«I>a)||| <£ (3.7) 

Which is of the same form as in Eq. M Here the regularization term & will enforce the sparsity 
of the coefficients a, . 

3.2.2 Proximal methods to solve convex problems 

The previous problem, because of the presence of the £\ norm, is a convex but non-differentiable 
problem. Therefore, we can not apply smooth optimization techniques. In the scope of the convex 
analysis, the proximal theory introduced by Moreau in [1331] 5 generalizes the notion of projection 
to a specific class of convex, lower semicontinuous functions for which we can find a minimum. 
Interested readers should refer to I [31 [3311 for an introduction on proximal theory. 

Problem [33] is of the form of Eq. [331 . 

minJfta) = ^(a)+&(a) (3.8) 

a 

where & and are not necessarily differentiable, not infinite everywhere and their domains 
have non-empty intersection [EBD - 

.73 as is a penalty put on the sparsity of the solution and is the data attachment term, or 
the data fidelity term, which promotes consistency of the solution with the input data. The use of the 
proximity operator helps us to determine a minimizer for Eq. [331 . The computation of a proximity 
operator for is hard and splitting methods can be used to evaluate the proximity operators of & 
and separately. Those methods split into three main classes 11341 13311 among which one can find 
the Forward-Backward splitting method which generalizes the classical gradient projection method 
as a constrained convex optimization ( 1 1361 . 13711 and reference therein). 

5 Original paper which launched proximity operators. 
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If Sf is differentiable (with a L-Lipschitz continuous gradient) and admits a proximal op¬ 
erator, then we can solve the problem with the Forward-Backward (FB) splitting method using the 
iterative method depicted in Eq. pT9] . 

x n +i = pmx Tn ^(x n - y„W(x„)) (3.9) 

where 0 < inf„ 7n < sup,,^ < 2/L. In Eq. 

4. Applying sparse reconstruction to radio interferometry 

4.1 Recasting the imaging inverse problem 

The link between imaging by aperture synthesis in radio and the use of convex optimization was 
already made in the literature in HTR1 . m HTi FTTL ITBl . The problem of radio interferometric imaging 
is an inverse problem as stated by Eq. [172] and as shown in Fig. [I]. 

In the context of the previous optimization framework, even the CLEAN algorithm can be in¬ 
terpreted as a matching-pursuit algorithm IIT21I using a dictionary with a single atom, the PSF. In the 
specific context of LOFAR (and SKA) and low frequency imaging, we proposed an implementation 
of this problem in the LOFAR imagers (AWimager I [2511) under the name Sparse Aperture Synthesis 
Image Reconstruction (SASIR) lUHl . With SASIR integrated in the imager, it becomes straight¬ 
forward to take into account the data format (Measurement Sets, the standard format of CASA 
software package [ED) containing simulated or real data, as well as introducing direction depen¬ 
dent corrections such as the effect of the antenna beam ( A-projection ) and of the non-coplanarity 
of the interferometer during wide-field imaging (W-projection). 

Theoretical introduction and numerical results on simulated and real LOFAR data are pre¬ 
sented in [U5|] h . Among the different approaches, we can cite MORESANE [FFTIl which uses a 
greedy approach and smart image modeling using the Starlet transform I ITT ETSTl and PURIFY ItTHl 
which also use sparse representation and convex optimization. Comparative tests are underway 
among the community of radio astronomers using the compressed sensing theory and a common 
testing framework, R.O.D.R.I.G.U.E.S. 6 7 [ET7I1 . is being developed in the scope of the SKA precur¬ 
sors and will be able to furnish the tools to compare in parallel the performance of different imagers 
using the same input data and inspecting the same figure of merit on the reconstructions. Hereafter, 
we describe the parameters and numerical results using SASIR only. 

4.1.1 Choosing a dictionary 

Wavelet decomposition has impacted signal/image processing and data compression in a profound 
way. Many different wavelets exist depending on the field and on the morphology of the signal 
(refer to |[29ll for an exhaustive overview of wavelets). For astronomy, the starlet transform (or 
Undecimated Isotropic Wavelet Transform - IUWT) has demonstrated to provide efficient decom¬ 
positions of extended complex sources while being computationally cheap. This transform operates 
as a spatial filter on the image. The number of decomposition scales must be adapted to the com¬ 
plexity and to the variety of spatial frequencies in the data. Some astrophysical structures present 

6 the source code is available at www.cosmostat.org 

Accessible at http://rodrigues.meqtrees.net/scheduler/ 
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Figure 1. Schematics of the inverse problem of imaging with y the visibility data, H the transform, combin¬ 
ing the Fourier transform and the masking operator to account for the unmeasured part of the (u,v) plane, 
and x which has a sparse representation of coefficients vector a in the dictionary <f>. 

filamentary features which can be better represented with curvelets [ffSil . An ideal dictionary would 
be a joint set of dictionaries which accounts for all the different kind of shapes in the sky. Such 
representation however requires advanced signal reconstruction methods, such as MCA or GMCA 

m. 

4.1.2 Choosing a minimization algorithm 

Our implementation of Eq. ^15] is based on the (Fast) Iterative Shrinkage Thresholding Algorithm - 
(F)ISTA EM . With the implementation of Eq. [3.2| . the forward step computes the gradient and the 
backward step evaluates the proximal operator of ||.||^ which reduces to a soft-thresholding step 
on the wavelets coefficients of x. Its implementation is simple and follows the algorithm [I] (see flTOl 
for details). Improvements of this algorithm include the implementation of a “reweighting” t\ lETH 
which reduces the bias of the solution. 

4.1.3 Choosing parameters 

In addition to the classical imaging parameters (number of pixels, pixel size on the sky, weighting 
scheme, ...), the gain and threshold definition has been updated in the context of convex optimiza¬ 
tion. The parameters /./ in Algorithm |T] is the relaxation parameter which impacts the convergence. 
jj. taken to 1.0 is generally satisfying but it should be reduced to slow down the convergence on 
complex data. The threshold parameter A is associated with the noise in the data and therefore 
is involved in the thresholding step of the wavelet coefficient. An improved thresholding scheme 
was adopted in SASIR and uses a different threshold A,- for each wavelet scale i. In each scale i of 
the reconstruction residuals R, we compute a,- using the Median Absolute Deviation (MAD) as a 
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Algorithm 1 FISTA implementation 
Require: 

A dictionary 4>. 

The implicit measurement/degradation operator A. 
The original visibility data V. 

A detection level k. 

The soft-thresholding operator: 

SoftThresh A (a) =prox A (a) = ((l - |Jjy) + a,) ^ 

1: Initialize a (0) =0, t (() *=l, x*°'=0 
2: for n = 0 to /V max — 1 do 
3: P (n+1) =xW+n<t> T A T {V 

4: x^ !+1) =SoftThresh AlA/ j3 ( ' ,+1) 

5: ?(»+!) = (1 + -y/l +4(f(")) 2 )/2 

6; y(«+l) = (f(») _ 1 )/f(" +1 ) 

7: a(" +1 ) = x (' ,+1) + y(" +1 ) ( x (" +1 ) — xW) 

8 : end for 
9: Return: image 


robust noise estimator and taking A; = nOj = n M ^ [231, n embodying a detection level in this 
context. 

4.2 Numerical tests 

In flTOl . we performed a series of tests to validate the quality of the image reconstruction. Astro¬ 
nomical images can be used for science if the photometry and the astrometry are accurate. We 
tested the reconstruction using two single point sources (§ |4.2.1| ), a grid of point sources in a wide- 
held (§ |4.2.2| ) and testing on simulated and real LOFAR data containing extended radio emissions 

(§[rop. 

4.2.1 Angular resolution 

We generated different LOFAR datasets containing two 1-Jy point sources at various angular dis¬ 
tance ranging from 30” to 30’. We jointly used the Cotton-Schwab CLEAN 11531] to produce 
CLEAN images and SASIR, both inside AWimager to produce the images. Results presented in 
[TTOl demonstrated the capability of SASIR to image with super resolution giving a factor of «2 
improvement as compared to the size of the CLEAN beam, on the same data and same weighting 
scheme. To beat the A/D limitation imposed by the sampling (and weighting) of the data in the 
(u,v) plane, the inpainting algorithm could restore unmeasured high spatial frequencies in the im¬ 
age. The maximum achievable resolution is therefore dependent on the size of the Fourier support, 
which is defined by the pixel size on the sky and the number of pixels in this support. We also 
compared the recovered flux density of the point sources, their astrometric errors in the presence of 
different level of noise (from a SNR of 3 up to 2000). Super-resolution is noticeable in a high SNR 
regime but the effective angular resolution falls back to that of CLEAN in the low SNR regime. 
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5 Input flux density (Jy) 

Figure 2. (Top) Total flux density reconstmction for a set of point sources with original flux density span¬ 
ning over 4 orders of magnitude with the original flux density (x-axis) vs. the recovered flux density (y-axis). 
(Bottom) scatter plot of the absolute error for each source. The recovered flux densities for Cotton Schwab 
CLEAN (red) and SASIR (blue) are represented on a linear scale, whereas the absolute error is on a loga¬ 
rithmic scale for clarity. Perfect reconstruction lies along the black line. Adapted from HT51 1. 

4.2.2 Photometry 

We created a sky model containing a rectangular grid of lOx 10 point sources, distributed evenly in 
a field of 8° x 8° with flux density values ranging from 1 to 10 4 Jy. After the image reconstruction, 
we used PyBDSM 8 as a source finder to locate and match the sources between the CLEAN and the 
Sparse reconstruction, as well as with the original sky model. The source finder could also give 
a measure of the flux densities of the detected source for each method. Fig. |2j shows the output 
flux densities versus the input flux densities of the sky model for CLEAN (in red) and SASIR (in 
blue). The photometry using SASIR is comparable to that of CLEAN. The latter being particularly 
well fitted for point sources. We expect no major improvement of the flux density with SASIR. 
Nonetheless, the difference can still be drawn while imaging extended emission data. 

4.2.3 Extended emission 

We simulated a Measurement Set containing a sky model of the W50 super nova remnant adapted 
from high frequency radio maps 0531]. As the emission is extended, we produced images using 
Cotton-Schwab CLEAN and Multiscale CLEAN. The sparse reconstruction with SASIR enabled 
a reconstruction with higher effective angular resolution and a lower level of the reconstruction 
residuals (by a factor of « 5). As a final test, we used a calibrated LOFAR dataset containing real 
data taken on Cygnus A, a strong radio source presenting two bright radio lobes at low frequencies. 

8 See http://www.lofar.org/wiki/doku.php for more information. 
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Figure 3. Reconstructed images of Cygnus A from a real LOFAR observation using Cotton Schwab CLEAN 
(left column), MS-CLEAN (middle column), and the sparse reconstruction with SASIR (right column). First 
row are the restored images and second row are the residual images. The sparse reconstruction presents a 
higher angular resolution and a lower residual level than that obtained with the two other methods. Adapted 
from 111611 . 

The sparse reconstruction came with higher resolution and better reconstruction residuals as com¬ 
pared to Cotton-Schwab CLEAN and MS-CLEAN. We present on Fig. ^ the reconstruction and 
residuals for the different methods. In lITHl . the LOFAR map obtained at 150 MHz with SASIR 
was compared to radio contours obtained with the VLA at twice the frequency (325 MHz). The 
recovered super-resolved structures match real structures seen in the VLA radio map which shows 
that SASIR is able to recover missing spatial information with high fidelity. 

5. Conclusions and Future developments 

Sparse representation, convex optimization and proximal theory offer a new framework to develop 
advanced tools for radio imaging using aperture synthesis. Tests and developments of these new 
kind of radio imagers are on going and are developing in the context of LOFAR and SKA. We pre¬ 
sented with SASIR, an example of solving an inverse problem through inpainting in the visibility 
plane. Sparse reconstruction with SASIR is now being tested on various objects such protoplan¬ 
etary disk imaging, planetary radio emissions, transients and most recently in the scope of radio 
weak lensing studies H551 , Fhl l in the context of SKA H57H . In cosmology, blobs of dark matter are 
known to induce shearing on the image of galaxies and there is scope for evaluating the quality of 
the galaxy shape fitting on sparse reconstructed images and compare the results to that of CLEAN. 
Future implementations of the sparse reconstruction will extend to a third dimension to focus on 
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transient detection and spectral imaging. Sparse methods provide robust and realistic images from 
sparsely sampled data. This sparsity can be used on puipose in the design of an array (e.g. lF%ll ). 
Instruments such as the SKA will provide a tremendous amount of redundant data that are dif¬ 
ficult to calibrate. By applying the new framework of Compressed Sensing in the hardware part 
(by sensing randomly or using random antenna configurations), it could enable the implementation 
of fast imaging modes which partly alleviates the problem of data storage, which is particularly 
heavy when the data to store are raw visibilities. These new methods can be applied, on a longer 
term, to instrumental calibration directly in order to deduce relevant instrumental parameters from 
sparse data. The use of these methods is only at its beginning in (radio)astronomy, but its potential 
scientific benefits could represent a big leap forward in the field of astronomical data processing 
and data imaging. 
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